A major dilemma for a salesman is guessing the price range a customer has in mind for a particular product, since directly asking for a customer's budget is sometimes considered rude. Hence, Analytics Educator is building a predictive model to estimate the total amount a customer is willing to pay. In this case we have taken data on car buyers, and our predictive algorithm will help us understand the price range at which the customer is looking to buy the car. The dataset contains the following variables: Customer Name, Customer e-mail, Country, Gender, Age, Annual Salary, Credit Card Debt, Net Worth, and Car Purchase Amount.
The model should predict the Car Purchase Amount.
We will be using two of the most important and robust techniques of the modern data science industry to build the model: Artificial Neural Networks and Extreme Gradient Boosting. Once done, we will compare the results to see which algorithm gives better accuracy.
Artificial Neural Networks (ANNs) are a family of machine learning methods designed to mimic the structure and operation of the human brain. They are built to identify intricate patterns in data and generate predictions based on that analysis. Due to their performance in a variety of applications, including image recognition, natural language processing, and speech recognition, as well as their capacity to process massive volumes of data quickly and accurately, ANNs have become quite popular.
ANNs are made up of layers of interconnected nodes, also referred to as artificial neurons. These neurons take in information, process it mathematically, and then pass their results on to the neurons in the next layer. ANNs adjust their weights and biases through a technique known as backpropagation to improve their performance on a specific task. The result is a strong and adaptable machine learning system that can be trained to tackle many challenging problems.
In recent years, advances in processing power and data availability have significantly improved the performance of ANNs and broadened their range of potential applications. As a result, ANNs are now an essential tool for scientists, engineers, and researchers working in a variety of fields.
Artificial Neural Networks are modelled after the structure and operation of the human brain and consist of interconnected nodes that process inputs and produce outputs. To perform better on a particular task, ANNs use a learning procedure that adjusts the weights, i.e. the strengths of the connections between nodes.
An artificial neuron, which receives input from other neurons and generates an output, is the fundamental building block of an ANN. Each input is multiplied by a weight, and the weighted inputs are summed. A bias term is then added to this sum, and the result is passed through an activation function to determine the neuron's output. This output is then fed to the neurons in the next layer.
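As a rough, self-contained illustration (not part of this post's model), the forward pass of one artificial neuron can be sketched in a few lines of NumPy; the inputs, weights, and bias below are made-up numbers:

import numpy as np

def relu(z):
    # ReLU activation: keeps positive values, clips negatives to 0
    return np.maximum(0, z)

# Made-up inputs and parameters for a single artificial neuron
inputs  = np.array([0.5, -1.2, 3.0])   # signals coming from the previous layer
weights = np.array([0.8,  0.1, 0.4])   # strength of each connection
bias    = 0.2                          # additive shift of the weighted sum

# Weighted sum plus bias, passed through the activation function
output = relu(np.dot(inputs, weights) + bias)
print(output)  # this value is passed on to the neurons in the next layer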
ANNs are organized into layers. The input layer receives the data and passes it to one or more hidden layers, and the output layer produces the network's final result. During training, the network is shown examples of input data together with their matching outputs, and its weights and biases are adjusted based on the discrepancy between the predicted output and the actual output. Backpropagation is the technique used to compute these adjustments and improve the network's performance on the task.
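To make the training idea concrete, here is a minimal, illustrative gradient-descent step for a single linear neuron with a squared-error loss; this is a drastic simplification of backpropagation (which applies the chain rule across many layers), and all numbers are made up:

import numpy as np

# One training example (made-up values)
x = np.array([1.0, 2.0])    # inputs
y_true = 3.0                # actual (target) output

# Current parameters and learning rate
w = np.array([0.5, -0.2])   # weights
b = 0.1                     # bias
lr = 0.01                   # learning rate

# Forward pass: prediction and squared-error loss
y_pred = np.dot(w, x) + b
loss = (y_pred - y_true) ** 2

# Backward pass: gradients of the loss with respect to the parameters
grad_w = 2 * (y_pred - y_true) * x
grad_b = 2 * (y_pred - y_true)

# Update: nudge the parameters in the direction that reduces the loss
w -= lr * grad_w
b -= lr * grad_b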
Once trained, the ANN can be applied to new data to produce predictions. The input data is passed through the network, and the weights and biases learned during training determine the output. ANNs have been applied successfully to numerous tasks, including audio and image recognition, natural language processing, and financial forecasting.
Extreme Gradient Boosting, or XGBoost, is a powerful and popular open-source machine learning framework used for supervised learning problems such as regression and classification. It is a distributed gradient boosting library that has been optimised to be highly efficient, flexible, and portable.
XGBoost builds an ensemble of weak decision trees, each of which is trained to correct the mistakes made by the trees before it. During training, XGBoost iteratively adds new decision trees to the ensemble in order to optimise a loss function.
The algorithm computes the gradient of the loss function with respect to the current predictions and uses it to determine the best split points for each tree. With this approach, XGBoost can handle enormous datasets and deliver state-of-the-art performance on a range of workloads.
XGBoost lets users tune a wide range of hyperparameters, including the learning rate, the maximum depth of each tree, and the number of trees in the ensemble. It also supports several forms of regularisation to reduce overfitting and improve generalisation. In addition, XGBoost offers helpful features such as built-in cross-validation and early stopping to aid hyperparameter tuning and avoid overfitting.
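As a hedged illustration of these knobs (all parameter values are arbitrary and the data is synthetic, generated only for this sketch), XGBoost's built-in cross-validation with early stopping might be used like this:

import numpy as np
import xgboost as xg

# Synthetic regression data purely for illustration
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4))
y_demo = X_demo @ np.array([3.0, -2.0, 0.5, 1.0]) + rng.normal(size=200)
dtrain = xg.DMatrix(X_demo, label=y_demo)

params = {
    "objective": "reg:squarederror",   # squared-error regression
    "learning_rate": 0.1,              # shrinks each tree's contribution
    "max_depth": 4,                    # limits how deep each tree can grow
    "reg_alpha": 0.1,                  # L1 regularisation on leaf weights
    "reg_lambda": 1.0,                 # L2 regularisation on leaf weights
}

# 5-fold cross-validation; stops adding trees once the test RMSE
# has not improved for 20 consecutive boosting rounds
cv_results = xg.cv(params, dtrain, num_boost_round=500, nfold=5,
                   early_stopping_rounds=20, seed=123)
print(cv_results.tail())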
Numerous applications, such as online advertising, fraud detection, and natural language processing, have effectively exploited XGBoost. Its effectiveness, scalability, and versatility have led to its widespread adoption in both academia and industry.
Extreme Gradient Boosting, or XGBoost, is a supervised machine learning algorithm that boosts model accuracy using gradients. The algorithm builds an ensemble of weak decision trees, each of which is trained to correct the mistakes made by the one before it. The fundamental idea behind XGBoost is to reduce a loss function by expanding the ensemble with new decision trees that fit the residuals of the earlier ones. By repeatedly adding fresh trees to the ensemble during training, XGBoost drives the loss function down. Each tree is trained on the data (optionally a subsample of it); the algorithm computes the gradient of the loss function with respect to the current predictions and uses it to choose the optimal split points for the tree.
The algorithm starts by building a single decision tree, a straightforward model that predicts the target variable from a set of input features. The next decision tree is then trained on the residuals (the differences between the predicted and actual values) of the first model. This process of fitting a decision tree and using its residuals to train the following tree is repeated until the required number of trees is reached, as sketched below.
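This residual-fitting loop can be sketched with plain scikit-learn decision trees; the snippet below is a simplified gradient-boosting skeleton for squared-error loss, not XGBoost's actual implementation:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def simple_gradient_boosting(X, y, n_trees=50, learning_rate=0.1, max_depth=3):
    # Start from a constant prediction: the mean of the target
    prediction = np.full(len(y), float(np.mean(y)))
    trees = []
    for _ in range(n_trees):
        residuals = y - prediction                     # errors of the current ensemble
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)                         # the next tree learns the residuals
        prediction += learning_rate * tree.predict(X)  # shrink and add its correction
        trees.append(tree)
    return trees, prediction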
To avoid overfitting, XGBoost supports a number of regularisation techniques, including L1 and L2 regularisation, which penalise large weights and restrict the complexity of each tree. Its built-in cross-validation and early stopping, mentioned earlier, further help with hyperparameter tuning and guard against overfitting.
Once the ensemble of trees has been trained, XGBoost combines the predictions from each tree to produce the final output. For classification problems, XGBoost transforms the predicted scores into class probabilities (using a softmax in the multi-class case). For regression problems, the final prediction is the sum of the contributions of all the trees, as illustrated below.
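For intuition only (the numbers below are made up), this is how per-tree outputs combine into one regression prediction: a sum added to the starting score rather than an average.

import numpy as np

# Hypothetical contributions of four trees for one sample
# (already shrunken by the learning rate during training)
tree_outputs = np.array([12000.0, 3100.0, -750.0, 420.0])
base_score = 0.5   # the default starting prediction

# Regression: the final output is the starting score plus the SUM
# of all tree contributions, not their average
prediction = base_score + tree_outputs.sum()
print(prediction)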
Overall, XGBoost is a strong and adaptable machine learning algorithm that excels at a wide range of tasks and can handle enormous datasets. Because of its effectiveness, scalability, and versatility, it is frequently utilised in both academia and industry.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import os
os.chdir("C:\\Users\\ASUS\\Desktop")
car_df = pd.read_csv('Car_Purchasing_Data.csv', encoding='ISO-8859-1')
car_df.head()
|   | Customer Name | Customer e-mail | Country | Gender | Age | Annual Salary | Credit Card Debt | Net Worth | Car Purchase Amount |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Martina Avila | cubilia.Curae.Phasellus@quisaccumsanconvallis.edu | Bulgaria | 0 | 41.851720 | 62812.09301 | 11609.380910 | 238961.2505 | 35321.45877 |
| 1 | Harlan Barnes | eu.dolor@diam.co.uk | Belize | 0 | 40.870623 | 66646.89292 | 9572.957136 | 530973.9078 | 45115.52566 |
| 2 | Naomi Rodriquez | vulputate.mauris.sagittis@ametconsectetueradip... | Algeria | 1 | 43.152897 | 53798.55112 | 11160.355060 | 638467.1773 | 42925.70921 |
| 3 | Jade Cunningham | malesuada@dignissim.com | Cook Islands | 1 | 58.271369 | 79370.03798 | 14426.164850 | 548599.0524 | 67422.36313 |
| 4 | Cedric Leach | felis.ullamcorper.viverra@egetmollislectus.net | Brazil | 1 | 57.313749 | 59729.15130 | 5358.712177 | 560304.0671 | 55915.46248 |
Here "Car Purchase Amount" is our dependent variable; we need to predict it based on other independent variables like Age, Annual Salary, Credit Card Debt etc.
car_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 9 columns):
 #   Column               Non-Null Count  Dtype
---  ------               --------------  -----
 0   Customer Name        500 non-null    object
 1   Customer e-mail      500 non-null    object
 2   Country              500 non-null    object
 3   Gender               500 non-null    int64
 4   Age                  500 non-null    float64
 5   Annual Salary        500 non-null    float64
 6   Credit Card Debt     500 non-null    float64
 7   Net Worth            500 non-null    float64
 8   Car Purchase Amount  500 non-null    float64
dtypes: float64(5), int64(1), object(3)
memory usage: 35.3+ KB
We can see that the data has a total of 500 rows and 9 columns. Customer Name, Customer e-mail and Country are character (object) variables. All character variables should be checked, since they may need to be converted into dummy variables.
n = car_df.nunique(axis=0)
n
Customer Name          498
Customer e-mail        500
Country                211
Gender                   2
Age                    500
Annual Salary          500
Credit Card Debt       500
Net Worth              500
Car Purchase Amount    500
dtype: int64
We can see that the object (character) variables - Customer Name, Customer e-mail and Country - have 498, 500, and 211 unique values respectively. This means that almost all the values are unique and common values are rare. Hence, these variables are of no use to us: even if we created dummy variables from them, they would hardly have any impact on the dependent variable. We will drop them.
car_df = car_df.drop(['Customer Name', 'Customer e-mail', 'Country'], axis = 1)
car_df.head(2)
|   | Gender | Age | Annual Salary | Credit Card Debt | Net Worth | Car Purchase Amount |
|---|---|---|---|---|---|---|
| 0 | 0 | 41.851720 | 62812.09301 | 11609.380910 | 238961.2505 | 35321.45877 |
| 1 | 0 | 40.870623 | 66646.89292 | 9572.957136 | 530973.9078 | 45115.52566 |
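As an aside, had a categorical column such as Country contained only a handful of distinct values, it could have been one-hot encoded instead of dropped. A hedged sketch on made-up data:

import pandas as pd

# Made-up example of a low-cardinality categorical column
demo = pd.DataFrame({"Country": ["India", "USA", "India", "UK"]})

# pd.get_dummies creates one 0/1 indicator column per category
dummies = pd.get_dummies(demo["Country"], prefix="Country")
print(dummies)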
# Suppress FutureWarnings before plotting the correlation heatmap
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
# Correlation matrix of the numeric variables
corr = car_df.corr()
# With thresh = 0 every off-diagonal correlation is kept; raise the
# threshold to display only the stronger correlations
thresh = 0
kot = corr[((corr >= thresh) | (corr <= -thresh)) & (corr != 1)]
plt.figure(figsize=(10,3))
sns.heatmap(kot, cmap="Reds", annot=True)
[Correlation heatmap of the numeric variables, annotated with the correlation values]
# Remove the label column from the training features
X = car_df.drop(['Car Purchase Amount'], axis=1)
# Assign the label values to y
y = car_df['Car Purchase Amount']
# Split it to a 70:30 Ratio Train:Test
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,random_state=42)
# Importing the Keras libraries and packages
import tensorflow as tf
### Initializing the ANN
ann = tf.keras.models.Sequential()
### Adding the input layer and the first hidden layer
ann.add(tf.keras.layers.Dense(units=6, activation='relu'))
### Adding the second hidden layer
ann.add(tf.keras.layers.Dense(units=6, activation='relu'))
### Adding the output layer (only 1 output, hence a single unit)
ann.add(tf.keras.layers.Dense(units=1))
### Compiling the ANN
ann.compile(optimizer = 'adam', loss = 'mean_squared_error')
### Training the ANN model on the Training set
# Convert the pandas objects to NumPy arrays before fitting
y_train = np.array(y_train)
X_train = np.array(X_train)
ann.fit(X_train, y_train, batch_size = 25, epochs = 100)
Epoch 1/100
14/14 [==============================] - 0s 846us/step - loss: 2254526208.0000
Epoch 2/100
14/14 [==============================] - 0s 769us/step - loss: 874682560.0000
Epoch 3/100
14/14 [==============================] - 0s 1ms/step - loss: 336855360.0000
...
Epoch 99/100
14/14 [==============================] - 0s 769us/step - loss: 46097556.0000
Epoch 100/100
14/14 [==============================] - 0s 692us/step - loss: 46090004.0000
# Generate predictions on the unseen test data
y_pred = ann.predict(X_test)
y_test = y_test.tolist()
d = pd.DataFrame()
d["y_test"] = y_test
d["y_pred"] = y_pred.flatten()  # flatten the (n, 1) Keras output to 1-D
# MAPE (Mean Absolute Percentage Error)
d["mp"] = (abs(d["y_test"] - d["y_pred"])) / d["y_test"]
(d.mp.mean())*100
13.022884808054105
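As a cross-check, the same metric can also be computed with scikit-learn's built-in helper (available from scikit-learn 0.24 onwards); it should agree with the manual calculation above:

from sklearn.metrics import mean_absolute_percentage_error

# Returns a fraction; multiply by 100 to express it as a percentage
mape = mean_absolute_percentage_error(d["y_test"], d["y_pred"]) * 100
print(mape)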
# Importing the XGB libraries and packages
import xgboost as xg
# Instantiation (note: 'reg:linear' is deprecated in newer XGBoost
# releases in favour of 'reg:squarederror', as the warning below shows)
xgb_r = xg.XGBRegressor(objective ='reg:linear', n_estimators = 100, seed = 123)
# Fitting the model
xgb_r.fit(X_train, y_train)
[09:36:09] WARNING: C:/Users/Administrator/workspace/xgboost-win64_release_1.5.1/src/objective/regression_obj.cu:188: reg:linear is now deprecated in favor of reg:squarederror.
XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
             colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
             gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='',
             learning_rate=0.300000012, max_delta_step=0, max_depth=6,
             min_child_weight=1, missing=nan, monotone_constraints='()',
             n_estimators=100, n_jobs=8, num_parallel_tree=1,
             objective='reg:linear', predictor='auto', random_state=123,
             reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=123,
             subsample=1, tree_method='exact', validate_parameters=1,
             verbosity=None)
y_pred = xgb_r.predict(X_test)
d = pd.DataFrame()
d["y_test"] = y_test
d["y_pred"] = y_pred
# MAPE
d["mp"] = abs((d["y_test"]- d["y_pred"])/d["y_test"])
(d.mp.mean())*100
3.632458679991241
Comparing the two models on the test set, XGBoost achieves a MAPE of about 3.6%, far better than the ANN's roughly 13%, so XGBoost gives the better accuracy here. Readers of this blog are welcome to mail us their suggestions on how to further improve the model; you will find our contact details here.
To know about all our courses please click here
If you want to read more such case studies then click on Whom should you ask for donations for a charity or Identify if a patient has cancer
Regression problems can be found at House Price Prediction and Insurance Premium Prediction
How to use Machine Learning in Real Estate companies, How to predict the price of 2nd hand cars
Learn Pandas Group by function or How to get a job in Data Science